The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes our team's two participating systems, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For CRFs, manually extracted rich features were utilized to train the model. For LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify tags for each token, following which a decoding layer was stacked to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-F1 measure of 89.86%, which was higher than that of the CRF-based system.
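As an illustration of the reported metric, the following is a minimal sketch of a strict micro-averaged F1 computation, assuming that "strict" means a predicted PHI span counts as correct only when its document, character offsets, and PHI type all match a gold span exactly. The span representation, the function name `strict_micro_f1`, and the toy data are hypothetical; this is not the official evaluation script.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span representation: (doc_id, start_offset, end_offset, phi_type)
Span = Tuple[str, int, int, str]


def strict_micro_f1(gold: Iterable[Span],
                    predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all documents and PHI types.

    Assumption: a prediction is a true positive only on an exact match
    of document, offsets, and PHI type.
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)

    # Exact matches are the spans present in both sets.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for illustration only).
gold_spans = [
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 35, "DATE"),
    ("doc2", 5, 12, "CITY"),
]
predicted_spans = [
    ("doc1", 0, 10, "NAME"),   # exact match
    ("doc1", 25, 34, "DATE"),  # boundary off by one: counted as an error
    ("doc2", 5, 12, "CITY"),   # exact match
    ("doc2", 40, 44, "AGE"),   # spurious prediction
]

precision, recall, f1 = strict_micro_f1(gold_spans, predicted_spans)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Because counts are pooled before averaging, frequent PHI categories weigh more heavily in the score than rare ones, and a span whose boundary is off by a single character earns no partial credit.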